Estimating individual treatment effects under unobserved confounding using binary instruments
Estimating individual treatment effects (ITEs) from observational data is
relevant in many fields such as personalized medicine. However, in practice,
the treatment assignment is usually confounded by unobserved variables and thus
introduces bias. A remedy to remove the bias is the use of instrumental
variables (IVs). Such settings are widespread in medicine (e.g., trials where
compliance is used as a binary IV). In this paper, we propose a novel, multiply
robust machine learning framework, called MRIV, for estimating ITEs using
binary IVs, which thus yields an unbiased ITE estimator. Different from previous
work for binary IVs, our framework estimates the ITE directly via a pseudo
outcome regression. (1) We provide a theoretical analysis where we show that
our framework yields multiply robust convergence rates: our ITE estimator
achieves fast convergence even if several nuisance estimators converge slowly.
(2) We further show that our framework asymptotically outperforms
state-of-the-art plug-in IV methods for ITE estimation. (3) We build upon our
theoretical results and propose a tailored deep neural network architecture
called MRIV-Net for ITE estimation using binary IVs. Across various
computational experiments, we demonstrate empirically that our MRIV-Net
achieves state-of-the-art performance. To the best of our knowledge, our MRIV
is the first machine learning framework for estimating ITEs in the binary IV
setting shown to be multiply robust.
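As background for the binary-IV setting described above, the following is a minimal sketch of the classical Wald (ratio) estimator on simulated data. It is not MRIV and estimates only a single average effect rather than ITEs. The function names, the simulation design, and the constant treatment effect of 2.0 are assumptions made purely for illustration.

```python
import numpy as np

def wald_estimate(z, a, y):
    """Classical Wald estimator for binary instrument z, treatment a, outcome y."""
    z = np.asarray(z, dtype=bool)
    a = np.asarray(a, dtype=float)
    y = np.asarray(y, dtype=float)
    # Reduced form: effect of the instrument on the outcome.
    reduced_form = y[z].mean() - y[~z].mean()
    # First stage: effect of the instrument on treatment uptake.
    first_stage = a[z].mean() - a[~z].mean()
    return reduced_form / first_stage

def naive_difference(a, y):
    """Difference in mean outcomes between treated and untreated (confounded)."""
    a = np.asarray(a, dtype=bool)
    y = np.asarray(y, dtype=float)
    return y[a].mean() - y[~a].mean()

def simulate(n=200_000, effect=2.0, seed=0):
    """Hypothetical data: u confounds a and y; z is randomized and affects only a."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)            # unobserved confounder
    z = rng.integers(0, 2, size=n)    # binary instrument
    a = (0.5 * u + 1.5 * z + rng.normal(size=n) > 0.75).astype(float)
    y = effect * a + u + rng.normal(size=n)
    return z, a, y

if __name__ == "__main__":
    z, a, y = simulate()
    print("naive:", naive_difference(a, y))
    print("wald: ", wald_estimate(z, a, y))
```

In this simulation the naive comparison absorbs the effect of the unobserved confounder, whereas the ratio of the two instrument contrasts does not, because the instrument is independent of the confounder by construction.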
Partial Counterfactual Identification of Continuous Outcomes with a Curvature Sensitivity Model
Counterfactual inference aims to answer retrospective "what if" questions
and thus belongs to the most fine-grained type of inference in Pearl's
causality ladder. Existing methods for counterfactual inference with continuous
outcomes aim at point identification and thus make strong and unnatural
assumptions about the underlying structural causal model. In this paper, we
relax these assumptions and aim at partial counterfactual identification of
continuous outcomes, i.e., when the counterfactual query resides in an
ignorance interval with informative bounds. We prove that, in general, the
ignorance interval of the counterfactual queries has non-informative bounds,
even when the functions of the structural causal models are continuously
differentiable. As a remedy, we propose a novel sensitivity model called
Curvature Sensitivity Model. This allows us to obtain informative bounds by
bounding the curvature of level sets of the functions. We further show that
existing point counterfactual identification methods are special cases of our
Curvature Sensitivity Model when the bound of the curvature is set to zero. We
then propose an implementation of our Curvature Sensitivity Model in the form
of a novel deep generative model, which we call Augmented Pseudo-Invertible
Decoder. Our implementation employs (i) residual normalizing flows with (ii)
variational augmentations. We empirically demonstrate the effectiveness of our
Augmented Pseudo-Invertible Decoder. To the best of our knowledge, ours is the
first partial identification model for Markovian structural causal models with
continuous outcomes.
Sharp Bounds for Generalized Causal Sensitivity Analysis
Causal inference from observational data is crucial for many disciplines such
as medicine and economics. However, sharp bounds for causal effects under
relaxations of the unconfoundedness assumption (causal sensitivity analysis)
are subject to ongoing research. So far, works with sharp bounds are restricted
to fairly simple settings (e.g., a single binary treatment). In this paper, we
propose a unified framework for causal sensitivity analysis under unobserved
confounding in various settings. For this, we propose a flexible generalization
of the marginal sensitivity model (MSM) and then derive sharp bounds for a
large class of causal effects. This includes (conditional) average treatment
effects, effects for mediation analysis and path analysis, and distributional
effects. Furthermore, our sensitivity model is applicable to discrete,
continuous, and time-varying treatments. It allows us to interpret the partial
identification problem under unobserved confounding as a distribution shift in
the latent confounders while evaluating the causal effect of interest. In the
special case of a single binary treatment, our bounds for (conditional) average
treatment effects coincide with recent optimality results for causal
sensitivity analysis. Finally, we propose a scalable algorithm to estimate our
sharp bounds from observational data.
Comment: Accepted at NeurIPS 202
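For orientation, the standard marginal sensitivity model for a single binary treatment, which the abstract above generalizes, can be written as follows. This is the textbook form and not the paper's generalized model. The symbols $A$ (treatment), $X$ (observed covariates), $U$ (unobserved confounders), and $\Gamma \ge 1$ (sensitivity parameter) are notation assumed here.

```latex
\[
  \frac{1}{\Gamma}
  \;\le\;
  \frac{\pi(x)}{1 - \pi(x)} \cdot \frac{1 - \pi(x, u)}{\pi(x, u)}
  \;\le\;
  \Gamma,
  \qquad
  \pi(x) = P(A = 1 \mid X = x),
  \quad
  \pi(x, u) = P(A = 1 \mid X = x, U = u).
\]
```

Setting $\Gamma = 1$ recovers unconfoundedness, and larger values of $\Gamma$ allow stronger unobserved confounding and therefore wider bounds.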
Normalizing Flows for Interventional Density Estimation
Existing machine learning methods for causal inference usually estimate
quantities expressed via the mean of potential outcomes (e.g., average
treatment effect). However, such quantities do not capture the full information
about the distribution of potential outcomes. In this work, we estimate the
density of potential outcomes after interventions using Interventional Normalizing Flows.
Specifically, we combine two normalizing flows, namely (i) a teacher flow for
estimating nuisance parameters and (ii) a student flow for a parametric
estimation of the density of potential outcomes. We further develop a tractable
optimization objective via a one-step bias correction for an efficient and
doubly robust estimation of the student flow parameters. As a result, our
Interventional Normalizing Flows offer a properly normalized density estimator.
Across various experiments, we demonstrate that our Interventional Normalizing
Flows are expressive and highly effective, and scale well with both sample size
and high-dimensional confounding. To the best of our knowledge, our
Interventional Normalizing Flows are the first fully-parametric, deep learning
method for density estimation of potential outcomes.
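As a reminder of why a normalizing flow gives a properly normalized density, the generic change-of-variables formula is sketched below. It is the standard identity for any invertible, differentiable map and is not specific to the teacher or student flow of the abstract above. The symbols $f_\theta$ (the flow) and $p_Z$ (the base density) are notation assumed here.

```latex
\[
  p_Y(y) \;=\; p_Z\bigl(f_\theta(y)\bigr)\,
  \left| \det \frac{\partial f_\theta(y)}{\partial y} \right| ,
  \qquad
  \int p_Y(y)\, \mathrm{d}y = 1 .
\]
```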
Fair Off-Policy Learning from Observational Data
Algorithmic decision-making in practice must be fair for legal, ethical, and
societal reasons. To achieve this, prior research has contributed various
approaches that ensure fairness in machine learning predictions, while
comparatively little effort has focused on fairness in decision-making,
specifically off-policy learning. In this paper, we propose a novel framework
for fair off-policy learning: we learn decision rules from observational data
under different notions of fairness, where we explicitly assume that
observational data were collected under a different, potentially discriminatory
behavioral policy. For this, we first formalize different fairness notions for
off-policy learning. We then propose a neural network-based framework to learn
optimal policies under different fairness notions. We further provide
theoretical guarantees in the form of generalization bounds for the
finite-sample version of our framework. We demonstrate the effectiveness of our
framework through extensive numerical experiments using both simulated and
real-world data. Altogether, our work enables algorithmic decision-making in a
wide array of practical applications where fairness must be ensured.
Comment: Revised version
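To illustrate what it means to evaluate a decision rule on data collected under a different behavioral policy, here is a minimal sketch of a plain inverse-propensity-weighted (IPW) policy value estimate. It does not include any fairness constraint and is not the neural framework of the abstract above. All function names and the simulated logging policy are assumptions for illustration.

```python
import numpy as np

def ipw_policy_value(rewards, behavior_probs, target_probs):
    """IPW estimate of the value of a target policy from logged data.

    behavior_probs[i]: probability the logging policy assigned to the logged action.
    target_probs[i]:   probability the target policy assigns to that same action.
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(behavior_probs, dtype=float)
    return float(np.mean(weights * rewards))

def simulate_logged_data(n=100_000, seed=0):
    """Hypothetical logs: binary context x, binary action a, noisy reward."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)
    p_treat = np.where(x == 1, 0.8, 0.3)          # behavior policy P(a=1 | x)
    a = rng.binomial(1, p_treat)
    # Acting helps when x == 1 and hurts when x == 0.
    reward = a * np.where(x == 1, 1.0, -1.0) + rng.normal(scale=0.1, size=n)
    behavior_probs = np.where(a == 1, p_treat, 1.0 - p_treat)
    target_probs = (a == x).astype(float)          # target policy: act iff x == 1
    return reward, behavior_probs, target_probs

if __name__ == "__main__":
    reward, b, t = simulate_logged_data()
    print("estimated value of target policy:", ipw_policy_value(reward, b, t))
```

In this simulation the target policy earns a reward of about 1 for half of the contexts and about 0 for the other half, so its true value is 0.5. The reweighting recovers this even though the logs were generated by a different policy.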
Bayesian Neural Controlled Differential Equations for Treatment Effect Estimation
Treatment effect estimation in continuous time is crucial for personalized
medicine. However, existing methods for this task are limited to point
estimates of the potential outcomes, whereas uncertainty estimates have been
ignored. Needless to say, uncertainty quantification is crucial for reliable
decision-making in medical applications. To fill this gap, we propose a novel
Bayesian neural controlled differential equation (BNCDE) for treatment effect
estimation in continuous time. In our BNCDE, the time dimension is modeled
through a coupled system of neural controlled differential equations and neural
stochastic differential equations, where the neural stochastic differential
equations allow for tractable variational Bayesian inference. Thereby, for an
assigned sequence of treatments, our BNCDE provides meaningful posterior
predictive distributions of the potential outcomes. To the best of our
knowledge, ours is the first tailored neural method to provide uncertainty
estimates of treatment effects in continuous time. As such, our method is of
direct practical value for promoting reliable decision-making in medicine.
Counterfactual Fairness for Predictions using Generative Adversarial Networks
Fairness in predictions is of direct importance in practice due to legal,
ethical, and societal reasons. It is often achieved through counterfactual
fairness, which ensures that the prediction for an individual is the same as
that in a counterfactual world under a different sensitive attribute. However,
achieving counterfactual fairness is challenging as counterfactuals are
unobservable. In this paper, we develop a novel deep neural network called
Generative Counterfactual Fairness Network (GCFN) for making predictions under
counterfactual fairness. Specifically, we leverage a tailored generative
adversarial network to directly learn the counterfactual distribution of the
descendants of the sensitive attribute, which we then use to enforce fair
predictions through a novel counterfactual mediator regularization. If the
counterfactual distribution is learned sufficiently well, our method is
mathematically guaranteed to ensure the notion of counterfactual fairness.
Thereby, our GCFN addresses key shortcomings of existing baselines that are
based on inferring latent variables, yet which (a) are potentially correlated
with the sensitive attributes and thus lead to bias, and (b) have weak
capability in constructing latent representations and thus low prediction
performance. Across various experiments, our method achieves state-of-the-art
performance. Using a real-world case study from recidivism prediction, we
further demonstrate that our method makes meaningful predictions in practice.
Reliable Off-Policy Learning for Dosage Combinations
Decision-making in personalized medicine such as cancer therapy or critical
care must often make choices for dosage combinations, i.e., multiple continuous
treatments. Existing work for this task has modeled the effect of multiple
treatments independently, while estimating the joint effect has received little
attention but comes with non-trivial challenges. In this paper, we propose a
novel method for reliable off-policy learning for dosage combinations. Our
method proceeds along three steps: (1) We develop a tailored neural network
that estimates the individualized dose-response function while accounting for
the joint effect of multiple dependent dosages. (2) We estimate the generalized
propensity score using conditional normalizing flows in order to detect regions
with limited overlap in the shared covariate-treatment space. (3) We present a
gradient-based learning algorithm to find the optimal, individualized dosage
combinations. Here, we ensure reliable estimation of the policy value by
avoiding regions with limited overlap. We finally perform an extensive
evaluation of our method to show its effectiveness. To the best of our
knowledge, ours is the first work to provide a method for reliable off-policy
learning for optimal dosage combinations.
Comment: Accepted at NeurIPS 202
Estimating average causal effects from patient trajectories
In medical practice, treatments are selected based on the expected causal
effects on patient outcomes. Here, the gold standard for estimating causal
effects is the randomized controlled trial; however, such trials are costly and
sometimes even unethical. Instead, medical practice is increasingly interested
in estimating causal effects among patient (sub)groups from electronic health
records, that is, observational data. In this paper, we aim at estimating the
average causal effect (ACE) from observational data (patient trajectories) that
are collected over time. For this, we propose DeepACE: an end-to-end deep
learning model. DeepACE leverages the iterative G-computation formula to adjust
for the bias induced by time-varying confounders. Moreover, we develop a novel
sequential targeting procedure which ensures that DeepACE has favorable
theoretical properties, i.e., is doubly robust and asymptotically efficient. To
the best of our knowledge, this is the first work that proposes an end-to-end
deep learning model tailored for estimating time-varying ACEs. We evaluate
DeepACE in an extensive set of experiments, confirming that it achieves
state-of-the-art performance. We further provide a case study for patients
suffering from low back pain to demonstrate that DeepACE generates important
and meaningful findings for clinical practice. Our work enables practitioners
to develop effective treatment recommendations based on population effects.
Comment: Accepted at AAAI 202
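The iterative G-computation formula mentioned above can be sketched for two time steps with ordinary least squares. This is a simplified illustration under a correctly specified linear model. It is not DeepACE and omits the sequential targeting step. The function names and the simulated trajectories are assumptions for illustration.

```python
import numpy as np

def fit_predict(features, target, features_new):
    """Ordinary least squares with intercept; returns predictions on features_new."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    design_new = np.column_stack([np.ones(len(features_new)), features_new])
    return design_new @ coef

def g_computation(x1, a1, x2, a2, y, plan):
    """Estimate E[Y(plan)] for a two-step treatment plan (p1, p2)."""
    n = len(y)
    p1, p2 = plan
    # Step 2: regress Y on the full history, then set A2 to the planned value.
    h2 = np.column_stack([x1, a1, x2, a2])
    h2_plan = np.column_stack([x1, a1, x2, np.full(n, p2)])
    q2 = fit_predict(h2, y, h2_plan)
    # Step 1: regress the step-2 predictions on the earlier history,
    # then set A1 to the planned value.
    h1 = np.column_stack([x1, a1])
    h1_plan = np.column_stack([x1, np.full(n, p1)])
    q1 = fit_predict(h1, q2, h1_plan)
    return float(q1.mean())

def simulate_trajectories(n=50_000, seed=0):
    """Hypothetical data where x2 is a time-varying confounder affected by a1."""
    rng = np.random.default_rng(seed)
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    x1 = rng.normal(size=n)
    a1 = rng.binomial(1, sigmoid(x1))
    x2 = 0.5 * x1 + a1 + rng.normal(size=n)
    a2 = rng.binomial(1, sigmoid(x2))
    y = x2 + 2.0 * a2 + a1 + rng.normal(size=n)
    return x1, a1, x2, a2, y

if __name__ == "__main__":
    data = simulate_trajectories()
    ace = g_computation(*data, plan=(1, 1)) - g_computation(*data, plan=(0, 0))
    print("estimated ACE of always-treat vs. never-treat:", ace)
```

In this simulation the true mean outcome under plan (p1, p2) is 2*p1 + 2*p2, so the average causal effect of always treating versus never treating is 4. The backward recursion accounts for x2 being both a confounder of the second treatment and a consequence of the first.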
Causal Transformer for Estimating Counterfactual Outcomes
Estimating counterfactual outcomes over time from observational data is
relevant for many applications (e.g., personalized medicine). Yet,
state-of-the-art methods build upon simple long short-term memory (LSTM)
networks, thus rendering inferences for complex, long-range dependencies
challenging. In this paper, we develop a novel Causal Transformer for
estimating counterfactual outcomes over time. Our model is specifically
designed to capture complex, long-range dependencies among time-varying
confounders. For this, we combine three transformer subnetworks with separate
inputs for time-varying covariates, previous treatments, and previous outcomes
into a joint network with in-between cross-attentions. We further develop a
custom, end-to-end training procedure for our Causal Transformer. Specifically,
we propose a novel counterfactual domain confusion loss to address confounding
bias: it aims to learn adversarially balanced representations, so that they are
predictive of the next outcome but non-predictive of the current treatment
assignment. We evaluate our Causal Transformer based on synthetic and
real-world datasets, where it achieves superior performance over current
baselines. To the best of our knowledge, this is the first work proposing a
transformer-based architecture for estimating counterfactual outcomes from
longitudinal data.
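One common way to make a representation non-predictive of the treatment is a domain confusion objective: the cross-entropy between a uniform distribution over treatments and the treatment classifier's predicted probabilities. The sketch below shows that generic objective in NumPy. Whether it matches the exact loss used in the Causal Transformer is an assumption, and the function name is hypothetical.

```python
import numpy as np

def confusion_loss(logits):
    """Cross-entropy between a uniform target and softmax(logits).

    logits: array of shape (n_samples, n_treatments) from a treatment classifier.
    The loss is smallest, equal to log(n_treatments), when every prediction is
    uniform, i.e., when the treatment cannot be inferred from the representation.
    """
    logits = np.asarray(logits, dtype=float)
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Averaging over classes applies the uniform target weights 1 / n_treatments.
    return float(-log_probs.mean())

if __name__ == "__main__":
    uninformative = np.zeros((4, 3))           # classifier cannot tell treatments apart
    confident = np.array([[5.0, 0.0, 0.0]])    # classifier identifies the treatment
    print(confusion_loss(uninformative), confusion_loss(confident))
```

In an adversarial setup, the representation would be trained to minimize this loss while the treatment classifier is trained with an ordinary cross-entropy loss on the observed treatments.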